28 research outputs found

    De Novo Assembly of Nucleotide Sequences in a Compressed Feature Space

    Sequencing technologies allow for an in-depth analysis of biological species, but the size of the generated datasets introduces a number of analytical challenges. Recently, we demonstrated the application of numerical sequence representations and data transformations for the alignment of short reads to a reference genome. Here, we expand our approach to de novo assembly of short reads. Our results demonstrate that highly compressed data can encapsulate the signal sufficiently to accurately assemble reads into large contigs or complete genomes

    A Method for Comparing Multivariate Time Series with Different Dimensions

    In many situations it is desirable to compare dynamical systems based on their behavior. Similarity of behavior often implies similarity of internal mechanisms or dependency on common extrinsic factors. While there are widely used methods for comparing univariate time series, most dynamical systems are characterized by multivariate time series. Yet, comparison of multivariate time series has been limited to cases where they share a common dimensionality. A semi-metric is a distance function that has the properties of non-negativity, symmetry and reflexivity, but not sub-additivity. Here we develop a semi-metric – SMETS – that can be used for comparing groups of time series that may have different dimensions. To demonstrate its utility, the method is applied to dynamic models of biochemical networks and to portfolios of shares. The former is an example of a case where the dependencies between system variables are known, while in the latter the system is treated (and behaves) as a black box
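The semi-metric definition in the abstract above can be illustrated with a small check. This is a minimal sketch, not SMETS itself: it uses squared Euclidean distance as a stand-in example of a function that is non-negative, symmetric and reflexive but not sub-additive. The function names and the sample points are assumptions made for illustration only.

```python
import itertools


def squared_euclidean(a, b):
    """Squared Euclidean distance: a standard example of a semi-metric (not SMETS)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def check_semi_metric(dist, points, tol=1e-12):
    """Empirically test the four distance properties on a finite set of points."""
    pairs = list(itertools.product(points, repeat=2))
    triples = list(itertools.product(points, repeat=3))
    return {
        # d(p, q) >= 0 for all pairs
        "non_negative": all(dist(p, q) >= 0 for p, q in pairs),
        # d(p, q) == d(q, p) for all pairs
        "symmetric": all(abs(dist(p, q) - dist(q, p)) <= tol for p, q in pairs),
        # d(p, p) == 0 for all points
        "reflexive": all(abs(dist(p, p)) <= tol for p in points),
        # d(p, r) <= d(p, q) + d(q, r) for all triples (triangle inequality)
        "sub_additive": all(
            dist(p, r) <= dist(p, q) + dist(q, r) + tol for p, q, r in triples
        ),
    }


# Hypothetical sample points on a line: d(0, 2) = 4 exceeds d(0, 1) + d(1, 2) = 2.
points = [(0.0,), (1.0,), (2.0,)]
report = check_semi_metric(squared_euclidean, points)
print(report)
```

On these points the first three properties hold and sub-additivity fails, which is the pattern the abstract describes for a semi-metric.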

    Local Binary Patterns as a Feature Descriptor in Alignment-free Visualisation of Metagenomic Data

    Shotgun sequencing has facilitated the analysis of complex microbial communities. However, clustering and visualising these communities without prior taxonomic information is a major challenge. Feature descriptor methods can be utilised to extract these taxonomic relations from the data. Here, we present a novel approach consisting of local binary patterns (LBP) coupled with randomised singular value decomposition (RSVD) and Barnes-Hut t-stochastic neighbor embedding (BH-tSNE) to highlight the underlying taxonomic structure of the metagenomic data. The effectiveness of our approach is demonstrated using several simulated metagenomic datasets and one real metagenomic dataset

    Marginalised stacked denoising autoencoders for metagenomic data binning

    Shotgun sequencing has facilitated the analysis of complex microbial communities. Recently we have shown how local binary patterns (LBP) from image processing can be used to analyse the sequenced samples. LBP codes represent the data in a sparse high dimensional space. To improve the performance of our pipeline, marginalised stacked autoencoders are used here to learn frequent LBP codes and map the high dimensional space to a lower dimension dense space. We demonstrate its performance using both low and high complexity simulated metagenomic data and compare the performance of our method with several existing techniques including principal component analysis (PCA) in the dimension reduction step and k-mer frequency in the feature extraction step
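To make the idea of LBP codes on sequence data concrete, here is a rough sketch of a one-dimensional local binary pattern histogram. It is not the authors' pipeline: the nucleotide-to-number mapping, the `radius` parameter, the bit ordering and the function names are all assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical numeric encoding of nucleotides; the abstract does not specify one.
NUCLEOTIDE_VALUE = {"A": 0, "C": 1, "G": 2, "T": 3}


def lbp_codes(signal, radius=2):
    """Return one LBP code per position that has `radius` neighbours on each side.

    Each neighbour contributes one bit: 1 if it is >= the centre value, else 0.
    """
    codes = []
    for i in range(radius, len(signal) - radius):
        centre = signal[i]
        neighbours = signal[i - radius:i] + signal[i + 1:i + radius + 1]
        code = 0
        for bit, value in enumerate(neighbours):
            if value >= centre:
                code |= 1 << bit
        codes.append(code)
    return codes


def lbp_histogram(sequence, radius=2):
    """Count how often each of the 2**(2*radius) possible LBP codes occurs."""
    signal = [NUCLEOTIDE_VALUE[base] for base in sequence.upper()]
    counts = Counter(lbp_codes(signal, radius))
    return [counts.get(code, 0) for code in range(2 ** (2 * radius))]


histogram = lbp_histogram("ACGTACGGTTAC", radius=2)
print(histogram)
```

The histogram has `2**(2*radius)` bins, so a larger radius gives a longer feature vector in which most bins are empty for a short read. This is one way to picture the sparse high-dimensional representation that the abstract says is then mapped to a denser, lower-dimensional space.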

    The Utility of Data Transformation for Alignment, De Novo Assembly and Classification of Short Read Virus Sequences.

    Advances in DNA sequencing technology are facilitating genomic analyses of unprecedented scope and scale, widening the gap between our abilities to generate and fully exploit biological sequence data. Comparable analytical challenges are encountered in other data-intensive fields involving sequential data, such as signal processing, in which dimensionality reduction (i.e., compression) methods are routinely used to lessen the computational burden of analyses. In this work, we explored the application of dimensionality reduction methods to numerically represent high-throughput sequence data for three important biological applications of virus sequence data: reference-based mapping, short sequence classification and de novo assembly. Leveraging highly compressed sequence transformations to accelerate sequence comparison, our approach yielded comparable accuracy to existing approaches, further demonstrating its suitability for sequences originating from diverse virus populations. We assessed the application of our methodology using both synthetic and real viral pathogen sequences. Our results show that the use of highly compressed sequence approximations can provide accurate results, with analytical performance retained and even enhanced through appropriate dimensionality reduction of sequence data

    Respiratory eukaryotic virome expansion and bacteriophage deficiency characterize childhood asthma

    Asthma development and exacerbation are linked to respiratory virus infections. There is limited information regarding the presence of viruses during non-exacerbation/infection periods. We investigated the nasopharyngeal/nasal virome during an asymptomatic period in a subset of 21 healthy and 35 asthmatic preschool children from the Predicta cohort. Using metagenomics, we described the virome ecology and the cross-species interactions within the microbiome. The virome was dominated by eukaryotic viruses, while prokaryotic viruses (bacteriophages) were independently observed with low abundance. Rhinovirus B species consistently dominated the virome in asthma. Anelloviridae were the most abundant and rich family in both health and asthma. However, their richness and alpha diversity were increased in asthma, along with the co-occurrence of different Anellovirus genera. Bacteriophages were richer and more diverse in healthy individuals. Unsupervised clustering identified three virome profiles that were correlated to asthma severity and control and were independent of treatment, suggesting a link between the respiratory virome and asthma. Finally, we observed different cross-species ecological associations in the healthy versus the asthmatic virus-bacterial interactome, and an expanded interactome of eukaryotic viruses in asthma. Upper respiratory virome "dysbiosis" appears to be a novel feature of preschool asthma during asymptomatic/non-infectious states and merits further investigation

    Investigation of Salmonella Phage–Bacteria Infection Profiles: Network Structure Reveals a Gradient of Target-Range from Generalist to Specialist Phage Clones in Nested Subsets

    From MDPI via Jisc Publications Router
    History: accepted 2021-06-23, pub-electronic 2021-06-28
    Publication status: Published
    Funder: Horizon 2020 Framework Programme; Grant(s): 767015
    Funder: The International Science and Technology Center; Grant(s): ISTC grant A-2140
    Bacteriophages that lyse Salmonella enterica are potential tools to target and control Salmonella infections. Investigating the host range of Salmonella phages is key to understanding their impact on bacterial ecology and coevolution and to informing their use in intervention strategies. Virus–host infection networks have been used to characterize the “predator–prey” interactions between phages and bacteria and provide insights into host range and specificity. Here, we characterize the target-range and infection profiles of 13 Salmonella phage clones against a diverse set of 141 Salmonella strains. The environmental source and taxonomy contributed to the observed infection profiles, and genetically proximal phages shared similar infection profiles. Using in vitro infection data, we analyzed the structure of the Salmonella phage–bacteria infection network. The network has a non-random nested organization and weak modularity, suggesting a gradient of target-range from generalist to specialist species with nested subsets, which are also observed within and across the different phage infection profile groups. Our results have implications for our understanding of the coevolutionary mechanisms shaping the ecological interactions between Salmonella phages and their bacterial hosts and can inform strategies for targeting Salmonella enterica with specific phage preparations

    Whole-genome analysis of Nigerian patients with breast cancer reveals ethnic-driven somatic evolution and distinct genomic subtypes

    Black women across the African diaspora experience more aggressive breast cancer with higher mortality rates than white women of European ancestry. Although inter-ethnic germline variation is known, differential somatic evolution has not been investigated in detail. Analysis of deep whole genomes of 97 breast cancers, with RNA-seq in a subset, from women in Nigeria in comparison with The Cancer Genome Atlas (n = 76) reveals a higher rate of genomic instability and increased intra-tumoral heterogeneity as well as a unique genomic subtype defined by early clonal GATA3 mutations with a 10.5-year younger age at diagnosis. We also find non-coding mutations in bona fide drivers (ZNF217 and SYPL1) and a previously unreported INDEL signature strongly associated with African ancestry proportion, underscoring the need to expand inclusion of diverse populations in biomedical research. Finally, we demonstrate that characterizing tumors for homologous recombination deficiency has significant clinical relevance in stratifying patients for potentially life-saving therapies

    Whole-genome analysis of Nigerian patients with breast cancer reveals ethnic-driven somatic evolution and distinct genomic subtypes

    From Springer Nature via Jisc Publications Router
    History: received 2020-12-12, accepted 2021-11-02, registration 2021-11-04, pub-electronic 2021-11-26, online 2021-11-26, collection 2021-12
    Publication status: Published
    Funder: Postdoctoral Research Fellowship P2BSP3_178591
    Funder: Francis Crick Institute (Francis Crick Institute Limited); doi: https://doi.org/10.13039/100010438
    Funder: Cancer Research UK (CRUK); doi: https://doi.org/10.13039/501100000289; Grant(s): FC001202
    Funder: Wellcome Trust (Wellcome); doi: https://doi.org/10.13039/100004440; Grant(s): FC001202
    Funder: U.S. Department of Health & Human Services | National Institutes of Health (NIH); doi: https://doi.org/10.13039/100000002; Grant(s): U01 CA161032, R01 MD013452, R01 CA228198, P20-CA233307
    Funder: Breast Cancer Research Foundation (BCRF); doi: https://doi.org/10.13039/100001006; Grant(s): BCRF-20-071, BCRF-19-120
    Funder: DH | National Institute for Health Research (NIHR); doi: https://doi.org/10.13039/501100000272; Grant(s): 203141/Z/16/Z
    Funder: Susan G. Komen (Susan G. Komen Breast Cancer Foundation); doi: https://doi.org/10.13039/100009634; Grant(s): SAC110026, SAC210203
    Funder: American Cancer Society (American Cancer Society, Inc.); doi: https://doi.org/10.13039/100000048